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Abstract 

Wc develop an online network tomography algorithm using a stochastic approximation variant of the Kaczmarz (SAK) 
algorithm. Under standard assumptions, we show that this algorithm converges almost surely. In fact, we show that starting 
from the same initial point, our SAK algorithm and the corresponding deterministic Kaczmarz algorithm converge to precisely 
the same point. Numerical examples via Matlab based simulations demonstrate the efficacy of the SAK algorithm. 
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1 Introduction 

Network tomography is infcircncc of spatially localized 
network behaviour using only end-to-end aggregate met- 
rics. Recent work can be classified into traffic volume and 
link delay tomography. A basic paradigm in both these 
is to infer the statistics of the random vector X from an 
ill-posed measurement model Y = AX, where the ma- 
trix A is assumed to be known a priori. See [9,7,12] for 
excellent surveys. 

In the transportation literature, the aim is to estimate 
the traffic volume on the end-to-end routes assuming ac- 
cess to only traffic volumes on a subset of links. Such 
problems are addressed in, among others, [13,4,15,16]. 
An excellent survey is given in [1] . An analogous problem 
has been addressed in packet networking, e.g., [11,18,17]. 
In all of these works, one sample of Y is assumed avail- 
able and X is estimated by a suitable regularization. 

Link delay tomography deals with estimation of link de- 
lay statistics from path delay measurements. Here the 
network is usually assumed to be in the form of a tree. 
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Multicast probe packets, real or emulated, are sent from 
the root node to the leaves. For each probe packet, a 
set of path-delay measurements from the root node to 
the leaf nodes is collected. These delays are correlated 
and this correlation is exploited to estimate the link de- 
lay statistics from the end-to-end measurements (e.g., 
[6,2,10,14]). Using T independent samples of the path 
delay vector, an expectation maximisation based algo- 
rithm is derived to obtain the maximum likelihood esti- 
mates of the link parameters. 

We mention here that there is also work on estimating 
link level loss statistics, link level bandwidths and link- 
level cross traffic from end-to-end measurements. There 
has also been progress in determining the topology of 
the network using end-to-end measurements. 

In this paper, we first show that, starting from the same 
initial point, the stochastic approximation variant of the 
Kaczmarz (SAK) algorithm and the usual determinis- 
tic Kaczmarz algorithm converge to precisely the same 
point. Using this, we develop a novel online algorithm for 
estimation of the moments and certain cross-moments 
of the elements of the vector X from a sequence of mea- 
surements of the elements of the vector Y. Our scheme 
can thus be used for both traffic volume and link delay 
tomography. Note that our algorithm is not 'one-shot' 
in that we do not collect a set of samples of the elements 
of Y and then estimate the above mentioned statistics 
of X. Rather, wc use the sequence of samples of the ele- 
ments of Y in real-time to iteratively update the desired 
estimates. In contrast to previous approaches, the SAK 
algorithm has the following positive features. 
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• The input required is only a sequence of independent 
samples of the elements of Y. 

• The entries of A can be arbitrary real numbers. 

• The elements of X are allowed to be correlated. 

• The desired estimates are obtained in real-time as a 
sequence of updates of the estimates. 

Note that for link delay tomography, as the first two 

featiircs siiggest, our algorithm does away with the need 
for multicast probe packet measurements and can be 
used even for networks with topologies other than tree. 

The paper is organized as follows. Section 2 formally de- 
scribes the notations and the central problem. Wc re- 
view some results from the theory of stochastic approx- 
imation in Section 3. This section also describes the de- 
terministic Kaczmarz algorithm. We develop our esti- 
mation scheme and its extensions in Sections 4 and 5 
respectively. We end by giving a numerical example in 
Section 6 and concluding remarks in Section 7. 

2 Model and problem description 

Our notations are as follows. We use E to denote ex- 
pectation. For n E N, we use [n] to represent the set 
{!,..., n}. The notation C([0, oo); M") represents the set 
of continuous functions x : [0, oo) R". For a vector 
w e R", we use u' to denote its transpose and to 
refer to its usual Euclidean norm. We use the subscript 
notation Ui to denote its z*'* component. We use the su- 
perscript convention t'' to refer to the fc*'' element of the 
sequence {t"}n>o- For the matrix A, the notation TZa 
indicates its row space. Lastly, we use x{t) to denote the 
derivative of the map x with respect to t. 

Let X = {Xi , . . . , Xjv)' denote the random vector whose 
statistics we wish to estimate. Let A e R'"^^, m < N, 
be an a priori known matrix with full row rank and let 



Y={Y,,...,Y^y = AX. 



(1) 



Let Z he a. random variable taking values in [to] such 
that, Vz G [to]. Pr{Z = i} ~: Xi > 0. Independent 
of everything else, let {.^"}„>i be a sequence of IID 
copies of Z. We assume that at each time step fc > we 
know the value of and can acquire an independent 
sample, denoted Yzk+i, of the scalar random variable 
Yz = 1 Yil{z=i}- Note that here I denotes the indi- 
cator random variable. The choice of notation Z'''^^ and 
Y^fc+i is to highlight the fact that these random vari- 
ables are independent of everything that has happened 
before time step k. 

Our objective is to develop a real-time algorithm, 
with provable convergence properties, to estimate 
E(XJ/ • • • Xf^^), ni, . . . , nfc > 0, whenever 3i e [to] 
such that ttij^ , cLijk sxe simultaneously non-zero. In 
particular, we wish to estimate E(X^), j e [N], k>0. 



3 Preliminaries 

This section summarizes relevant results from the theory 
of stochastic approximation and also describes the usual 
deterministic Kaczmarz algorithm. 

3.1 Stochastic approximation algorithms 

Consider the iterative algorithm I in R" given by 



x'' + i]''h{x''), 



(2) 



where /i : M" — R" and {7?'^}fe>o is a chosen stepsize 
sequence. The algorithm I' given by 



= x'' +'q''[h{x'')+C''+'^] 



(3) 



where ^'^^^ represents noise, is known as the stochastic 
approximation variant of the original scheme X. Observe 

that as 7]'^ 0, (3) can be viewed as a noisy discretiza- 
tion of the limiting ordinary differential equation (ODE) 



x{t) = h{x{t)). 



(4) 



This suggests that by analyzing (4) one may be able to 
predict the asymptotic behaviour of I'. This approach 
is described next. 

Let t° = and t^ = Y!1=o v',^ke N. Let 

, , \ x^ iit = t^ 

x{t) = { k (5) 

Also, let x'^{t), t > s, denote the solution to (4) starting 
at s with x^{s) = x{s). 

We assume the following conditions for analysis. 

(Al) The map /i : R" ^ R" is Lipschitz continuous. 

(A2) The stepsize sequence {?7*'}fe>o is positive and sat- 
isfies i) Y^kLo 11^ = 00 and ii) Y^kLoiv f < 00. 

(A3) {S,'^} is a martingale difference sequence with re- 
spect to the (7-fields{J"'^}, where J"'^ := o-(.x",^\ . . . ,^'^). 
Also, V/c > 0, ^'^^^ is square integrable and almost 
surely £;[||^'=+i||2| J-^] < L(l + [jx^H^) for some L > 0. 

(A4) There exists e C([0, oc); M") such that Vu e 
R", h{cu)/c — > hoo{u) as c ^ 00. Also, the ODE x{t) = 
hoo{x{t)) has origin as its sole globally asymptotically 
stable equilibrium. 



Lemma 1 [5] Almost surely, 



lim sup \\x{t)-x\t)\\=Q,'iT >Q. 

te[s,s+T\ 



(6) 
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This lemma demonstrates that asymptotically x{t) mim- 
ics the behaviour of the ODE given in (4). 

We end this section by giving a stronger version of the 
above result when there also exists a Liapunov function, 
£ : R" ^ M. Let us suppose that H := {x € W : 
C{x) = 0} is a finite set. 

Lemma 2 [5] The iterates {x^} almost surely converge 
to a possibly sample path dependent point in H. 

3.2 Kaczmarz algorithm 

Consider the inverse problem of estimating a fixed v* G 
M.^ from the measurement Av* , where A € K"*^^ and 
m < N. Without loss of generality, let us suppose that 
A has full row rank and its row vectors have unit norm. 
Given an initial approximation a;° oiv* , a natural opti- 
mization problem to consider is 

min llti — a;°|| s.t. Au = Av* . (7) 

Elementary calculation shows that its solution is 

X* =x^ + A'{AA')-^A{v* -x"^). (8) 

Clearly, x* & A° := + 7^^. As ^ has full row rank, 
X* is the only point in that satisfies Au = Av* . The 
Kaczmarz algorithm uses this fact to solve (7). With a 
prescribed initial point a;'', its update rule is given by 

x^+^ =x'' + r] - aR^k) (9) 

where R{k) = [k mod m) + 1, is the j*'* row of A and 
?7 is the stepsize of the algorithm. 

Theorem 1 [8] IfO < < 2, then x^ ^ x* ask ^ oo. 

The condition on xP to ensure x* is a close enough ap- 
proximation of V* is discussed next. Let A* := v* + TZa- 
Since A°, A* are translations of the subspace TZa, we 
have dist(^°,^*) = dist(a;°,^*) = dist(^°,t;*). Now 
the fact that A{x* — v*) = shows that x* — v* is 
perpendicular to TZa and hence to A^ and A*. Thus, 
\\v* - x*\\ = dist(ylo,w*) = dist(a;0,^*). The following 
lemma is now straightforward. 

Lemma 3 For any ^ > 0, — v*\\ < S if and only if 

dist{x°,A*) < S. 

4 The SAK Algorithm 

In this section, given an initial approximation x^, we 
develop a SAK algorithm to estimate EX for the model 
given in Section 2. We also compare its behaviour against 
that of the usual deterministic Kaczmarz algorithm. 



Observe from (1) that 

EY = AEX. (10) 

By rescaling equations, if necessary, we ensure that the 
rows of A are of unit norm. Observe then that if we 
know EY exactly then the problem of determining EX, 
given the initial approximation x^, is precisely of the 
form given in (7). We can thus directly use the usual 
Kaczmarz algorithm. This, as (9) says, would be 

j.k+1 = a-fe + ^ (EYn^k) - a'R^k)^'') aR{k) (H) 
and it would converge to 

:c* = x° + A'{AA')-^{EY - ^a;°). (12) 

But recall from Section 2 that, for each i G [m], we have 
access only to IID samples of the random variable Yi . So 
the next best thing one can do is to obtain an estimate £ 
of EY such that \ \£ — EY\ \ is small with high probability 
and then use (11) with EY replaced with £. Clearly, this 
approach can work only after a sufficiently large number 
of samples of each Y^ have been collected; thus ensuring 
that the estimation is offline in nature. 

A better alternative is to use a SAK algorithm. Using 
the notations and assumptions made in Section 2, a SAK 
algorithm to estimate EX, inspired by (11), is 

= j-fe + r]^\Yzk+i - a'zk+ix'']azk+i, (13) 

where {rj'^} is a stepsize sequence satisfying the assump- 
tion (^2) of Subsection 3.1. We emphasize here that (13) 

• makes direct use of the noisy measurements {Yzk+i} 
which ensures that it is real-time, and, 

• does away with making measurements of the compo- 
nents of y in a cumbersome round-robin fashion. 

We now study the asymptotic behaviour of (13). Note 

that if A'^ := .t" + TZa, then, as with (11), the iterates 
{x*^} of (13) always remain confined to A° . But yueA°, 
the full row rank condition of A ensures that there is a 
unique a e M"* such that u = x° + A' a. Instead of (13), 
one can thus equivalently analyze the algorithm 

^k+i = ^fc + - ezk+iA{x° + A'a% (14) 

where a'^ — 0, e^t+i is the m x m, matrix with 1 in its 
Z^'^^—th. diagonal position and zero elsewhere and Y^'^^ 
is the m— dimensional vector with its Z'^^^—th. position 
occupied by Yzk+i and zero elsewhere. 

Note that, for each fc > 0, a'= e and 

x'' = x'^ + A' . (15) 
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Let 7*^+1 = [f*+i - ezk+iA{x" + A' a'')]. Defin- 
ing ^^=+1 = 7'=+! - A (EF - A{x° + A'a'')) , where 
A = diag(Ai, . . . , Am), note that (14) can be rewritten as 

^k+i ^^k^^k - A{x° + A'a'')) + ^^^+1] . (16) 

This is clearly in the form given in (3) with 

h{u) = A{EY- A{x^ + A'u)) . (17) 

Its limiting ODE is thus given by 

a{t) = A{EY- A{x° + A'a{t))) . (18) 

Theorem 2 Let a* := {AA')-'^{EY - Ax°). If each 
has finite variance, then, almost surely, — )■ a* as 
k ^ 00. 

Proof For each fc > 0, let J"*' := a{a°, C\ • • • , C'')- Our 
first aim is to show that the conditions (^1)-(j44) as 
defined in the Subsection 3.1 hold true for (14). 

Since h of (17) is an affine function, (Al) clearly holds. 
Assumption (A2) follows from our definition of stepsize 

sequence {r]''}. For (^3), note that E(^'=+i| J"'') = 0. 
Thus l^*^} is a martingale difference sequence with 
respect to {7"'=}. Also, < Fi + Fa + F3, 

where Fi ^ \\Y''+'^ - AEY\\, F2 = \\ezk+iAx° - 
A Ax°\\ and F3 = \\ezk+iA A'a'' - A AA'a''\\. Thus, 

V^TllF+TI^ < ELi VWfWl- Now the fact 
that each Xj and consequently each Yi has finite vari- 
ance implies that 3Li > 0, independent of k, such that 
-y/E"[lYi^?fe] < Li. Since takes values only from 

[m], a finite set, it also follows that y^E [T^lTk] < L2 
and y'E [FglJ^^] < L3||a'°|| for some positive constants 
L2 and Z/3, again independent of k. Arguing separately 
when 1 1 a'' 1 1 < 1 and ||q!''|| > 1, it follows that {A3) 
also holds. Next note that if hc{u) := h{cu)/c and 
hoo{u) := —AAA'u, then hc{u) — >■ /loo(u) as c — > 00 
pointwise. Also, note that if £oo{u) := u'AA'u, then 
^oo{u) > Vu e IR™ with equality only at the origin. 
Also, any solution to the ODE a{t) = —AAA'a{t) sat- 
isfies £00 («(*)) = -2a{t)AA'AAA'a{t) < 0, again with 
equality only at the origin. These imply that £00 is 
a Liapunov function for the ODE a{t) = —h^{a{t)). 
Hence, the origin is its sole globally asymptotically 
stable equilibrium. Thus (^4) is also true. 

Because of Lemma 2, one way to verify that a*^ — )• a* 
as fc — >■ 00, almost surely, is to show that a* is the 
unique globally asymptotically stable equilibrium of the 
ODE given in (18). Towards this, consider the function 
£(u) = ||A'(u-a*)||2.As^has full row rank, £(u) =0 
if and only if w = a*. Next note that along any solu- 
tion a{t) of (18), Ciait)) = 2{a(t) - a*yAA'a{t). But 
a{t) = -AAA'{a{t)-a*). Thus£(a(i)) < with equal- 
ity only when a{t) = a* . This shows that £ is a Liapunov 



function at a* . Thus a* is the sole globally asymptoti- 
cally stable equilibrium of (18) as desired. □ 

Because of (15), the following result is now immediate. 

Corollary 1 If each Xj has finite variance, then, almost 
surely, the SAK algorithm of (13) converges to x* of ( 12). 

Let A* = EX+TZa and let us suppose that using domain 
knowledge one can obtain an initial approximation x^ of 
EX such that dist(x°,^*) < 5. Using Lemma 3, it then 

follows that the SAK algorithm can be used to accurately 
estimate EA". From the same lemma, however, it also 
follows that if dist(a;''. A*) is large, then the limit of the 
SAK algorithm will be far away from EX. 

We end this section by summarizing the proposed 
method to estimate EX, given an initial approximation 
a;*^, in an algorithmic fashion. 

Algorithm 1. The SAK Algorithm 

(1) Choose an arbitrary positive number, say fi. 

(2) For each fc > 0, 

(a) SETp= 

(b) SET (i = ^2..+!. 

(c) SET x'^+i =x'' + l[{d- a'px'')ap\ . 
5 Extensions 

Observe that, for any g € N and each i e [m], 

where r = (ri, . . . , tat), A^v.„ = {r e : Y,j = l} 
and ( « ) = , , . Let X^ denote the (^;>t'^7^) 
dimensional vector (O^i -^j^ ■ ^ ^ ^N,k) and let Y'^ = 
{Yf,...,Y^). Also, let A" denote m x matrix, 

where A^j is the coefficient associated with j*'* term of 

X« as given in (19). Clearly, the set of relations in (19) 
can be compactly written as 

Y" = A^X". (20) 

By dropping dependent rows if necessary, we ensure that 
the matrix A"^ has full row rank. Note then that (20) is 
of the same form as (1). One can thus reuse Algorithm 1, 
after replacing samples of Y^ with those of Y^' and A 

with A"^, to estimate in real-time E ^IljLi ^J^) ^'^^ 
r e An.q- For desired r, the only condition one needs 
to ensure is that 3i € [m] such that, whenever rj ^ 0, 
Afj ^ 0. Clearly, by choosing appropriate q in (19), one 
can estimate the moments of any desired order. 
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Fig. 1. Network for simulation experiment 

Remark: Given estimates of finitely many mo- 
ments, one can postulate a maximum entropy dis- 
tribution with given values of these moments by a 
standard procedure. For example, given estimates 
w a, w b, maximum entropy distri- 
bution is p~"'^ea;p (— (A||a;|p + , where p is the 
normalizing factor and A, fi are chosen so as to ensure 
E[\\Xr]=a,E[\\Xr]^b 

6 Experimental Results 

We illustrate the efficacy of the method that we have de- 
veloped above with a numerical example of delay tomog- 
raphy for the network of Figure 1. We use Algorithm 1 
to estimate in real-time the moments and some cross 
moments of delay experienced by probe packets across 
network links using only end-to-end delay measurements 
across selected paths. 

The setup is as follows. We use six paths with nodes 1 
and 2 as the starting nodes and nodes 3, 4, and 5 as the 
end nodes. The delay a probe packet experiences while 
traversing link j is a random variable Xj with arbitrary 
non-negative distribution. The paths across which the 
delay measurements can be made is given below. 



1110000000000 
1001000110000 
1001000100011 
0010110001000 
0000011110000 
0000010001110 



A = 



while the columns correspond to links. The element Aij 
of this matrix is 1 precisely when the j*^ link is present 
on the i*'' path. It is easy to see that the delay across 
path i, denoted Yi, is given by Yi — {AX)i. 

The experiment is as follows. We generate a million 
probe packets where the n*'' packet is sent along a path 
whose index, say Z", is chosen randomly from the index 
set {1, . . . , 6}. Thus each path gets about 167,000 sam- 
ples. For each packet, we record the delay, say Yz" , that 
it experiences. In parallel, we run our Algorithm 1 with 
a million iterations first for the linear relation of (1) and 
then for (19) with q = 2. 



The chosen start point, the actual value and final esti- 
mated value of moments are given in Tables 1 and 2. 
Note that in both cases the initial point satisfies the as- 
sumption of Lemma 3 and hence the final estimates are 
close to the actual values. In Table 2, note that we give 
only a subset of results. Figure 2 shows the real-time es- 
timates of expected delay for candidate links 1, 4 and 5. 
Observe that although we run the simulation for a mil- 
lion packets, estimates are very nearly the true values in 
after about 100 iterations for links 1 and 4. 

Table 1 

True and Estimated value of Expected Link Delay 



Linkid 


start Point 


True Value 


Estimate 


1 


14.5895 


50.0650 


49.5900 


2 


0.0000 


26.3106 


23.0207 


3 


50.7452 


41.9718 


43.2525 


4 


0.0000 


9.0966 


11.9798 


5 


58.0881 


23.0625 


27.5747 


6 


12.0012 


48.0923 


48.4476 


7 


9.6104 


41.5553 


39.7147 


8 


8.9746 


49.9272 


51.0588 


9 


0.0000 


34.7536 


35.3955 


10 


0.0000 


3.7842 


6.3421 


11 


0.0000 


44.1365 


36.8555 


12 


0.0000 


48.6276 


43.5441 


13 


20.1407 


29.0983 


26.8294 



Table 2 

True and Estimated value of some 2nd Order Moments 



The rows of the path-link matrix A correspond to paths 



Moment 


start Point 


True Value 


Estimate 


E{Xi) 


17388 


20539 


20570 


E{X'i) 





277.85 


286.29 


E(X?2) 





9143 


9160.1 


E(X3Xio) 


15985 


158.83 


164.34 


E(X8Xi2) 


-126 


2427.8 


2390.5 



5 



200 



100 



-100 




10 10 
(a) Link 1 



200 



100 



-100 




10 10 

(b) Link 4 



200 



100 



-100 




10 10 
(c) Link 5 



Fig. 2. Convergence of real-time estimates of expected delay (solid) to its actual value (dotted). The iterations (horizontal 
axis) are plotted using a log scale to illustrate the initial drift and the eventual settling down of the estimates (vertical axis). 



7 Discussion 

We presented a simple, online, iterative network tomog- 
raphy scheme based on the stochastic approximation 
variant of the Kaczmarz algorithm. Although our algo- 
rithm was motivated by network tomography problems, 
we believe that it is useful in its own right. Using this 
SAK scheme, we described the method to obtain online 
estimates of the marginal and the joint moments of link 
level parameters using only end-to-end path measure- 
ments. Recall that the analysis above was performed un- 
der the hypothesis of square-integrability of the martin- 
gale difference noise. If noise is heavy-tailed, then the 
square integrability condition is violated and our the- 
ory is not applicable. Recent extensions of stochastic ap- 
proximation theory to such noise processes, see [3] , pro- 
vide appropriate counterparts of the results of [5] that 
enable ns to handle this situation under fairly general 
hypotheses. We do not pursue this here. 
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